MANCHESTER UNITED SEASONS 2014 TO 2021

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [16]:
def man_u(row):
    if row['HomeTeam']== "Man United":
        return row['FTHG']
    else:
        return row['FTAG']

df['MU_goals'] = df.apply(man_u,axis=1)
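The row-wise apply above can also be written as a vectorized selection. This is a minimal sketch, assuming the same column names (HomeTeam, FTHG, FTAG). The `sample` frame is a hypothetical stand-in for `df`, built from three 14-15 matches shown in this notebook.

```python
import numpy as np
import pandas as pd

# Three rows copied from the 14-15 matches in this notebook (rows 0, 3 and 4).
sample = pd.DataFrame({
    "HomeTeam": ["Man United", "Man United", "Leicester"],
    "AwayTeam": ["Swansea", "QPR", "Man United"],
    "FTHG": [1, 4, 5],
    "FTAG": [2, 0, 3],
})

# Take home goals when Man United is the home side, away goals otherwise.
sample["MU_goals"] = np.where(
    sample["HomeTeam"] == "Man United", sample["FTHG"], sample["FTAG"]
)
print(sample["MU_goals"].tolist())
```

On a full season this avoids calling a Python function once per row, and it should give the same column as the apply version.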
In [80]:
team ="Man United"
df["Points"]=0
In [90]:
def calpoints(row):
    if row['HomeTeam']==team and row['FTR']=='H':
        return 3
    elif row['AwayTeam']==team and row['FTR']=='A':
        return 3
    elif row['FTR']=='D':
        return 1
    return 0
df["Points"] = df.apply(calpoints,axis=1)

df.to_csv("14-21.csv",index=False)
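The points rule (3 for a win, 1 for a draw, 0 for a loss) can also be expressed with boolean masks. This is a sketch, assuming the FTR codes H, A and D used in this data. The `sample` frame and the mask names are illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd

team = "Man United"

# Four rows copied from the 14-15 matches in this notebook (rows 0, 1, 3 and 4).
sample = pd.DataFrame({
    "HomeTeam": ["Man United", "Sunderland", "Man United", "Leicester"],
    "AwayTeam": ["Swansea", "Man United", "QPR", "Man United"],
    "FTR": ["A", "D", "H", "H"],
})

# A win is a home result while playing at home, or an away result while away.
home_win = (sample["HomeTeam"] == team) & (sample["FTR"] == "H")
away_win = (sample["AwayTeam"] == team) & (sample["FTR"] == "A")
draw = sample["FTR"] == "D"

# The first matching condition decides the value; everything else is a loss.
sample["Points"] = np.select([home_win | away_win, draw], [3, 1], default=0)
print(sample["Points"].tolist())
```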
        
In [2]:
df = pd.read_csv("14-21.csv")
df.head()
Out[2]:
Season Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HS ... HF AF HC AC HY AY HR AR Points MU_goals
0 14-15 8/16/2014 Man United Swansea 1 2 A 0 1 14 ... 14 20 4 0 2 4 0 0 0 1
1 14-15 8/24/2014 Sunderland Man United 1 1 D 1 1 11 ... 10 15 4 4 0 2 0 0 1 1
2 14-15 8/30/2014 Burnley Man United 0 0 D 0 0 9 ... 10 14 3 6 2 2 0 0 1 0
3 14-15 9/14/2014 Man United QPR 4 0 H 3 0 19 ... 11 8 3 1 1 0 0 0 3 4
4 14-15 9/21/2014 Leicester Man United 5 3 H 1 2 15 ... 11 9 2 4 1 1 0 1 0 3

5 rows × 23 columns

In [107]:
df.drop(columns = ['Referee'],inplace =True)
In [14]:
df.head(39)
Out[14]:
Season Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR ... HF AF HC AC HY AY HR AR Points MU_goals
0 14-15 8/16/2014 Man United Swansea 1 2 A 0 1 A ... 14 20 4 0 2 4 0 0 0 1
1 14-15 8/24/2014 Sunderland Man United 1 1 D 1 1 D ... 10 15 4 4 0 2 0 0 1 1
2 14-15 8/30/2014 Burnley Man United 0 0 D 0 0 D ... 10 14 3 6 2 2 0 0 1 0
3 14-15 9/14/2014 Man United QPR 4 0 H 3 0 H ... 11 8 3 1 1 0 0 0 3 4
4 14-15 9/21/2014 Leicester Man United 5 3 H 1 2 A ... 11 9 2 4 1 1 0 1 0 3
5 14-15 9/27/2014 Man United West Ham 2 1 H 2 1 H ... 10 12 7 10 1 3 1 0 3 2
6 14-15 10/5/2014 Man United Everton 2 1 H 1 0 H ... 18 11 11 6 4 3 0 0 3 2
7 14-15 10/20/2014 West Brom Man United 2 2 D 1 0 H ... 6 8 0 11 1 2 0 0 1 2
8 14-15 10/26/2014 Man United Chelsea 1 1 D 0 0 D ... 13 14 4 7 3 5 0 1 1 1
9 14-15 11/2/2014 Man City Man United 1 0 H 0 0 D ... 15 9 7 4 3 1 0 1 0 0
10 14-15 11/8/2014 Man United Crystal Palace 1 0 H 0 0 D ... 8 12 11 4 1 3 0 0 3 1
11 14-15 11/22/2014 Arsenal Man United 1 2 A 0 0 D ... 12 8 11 5 2 1 0 0 3 2
12 14-15 11/29/2014 Man United Hull 3 0 H 2 0 H ... 12 12 7 0 2 2 0 0 3 3
13 14-15 12/2/2014 Man United Stoke 2 1 H 1 1 D ... 8 13 2 2 2 4 0 0 3 2
14 14-15 12/8/2014 Southampton Man United 1 2 A 1 1 D ... 12 9 5 1 2 1 0 0 3 2
15 14-15 12/14/2014 Man United Liverpool 3 0 H 2 0 H ... 13 14 2 7 4 3 0 0 3 3
16 14-15 12/20/2014 Aston Villa Man United 1 1 D 1 0 H ... 9 10 3 9 1 1 1 0 1 1
17 14-15 12/26/2014 Man United Newcastle 3 1 H 2 0 H ... 16 9 4 5 1 2 0 0 3 3
18 14-15 12/28/2014 Tottenham Man United 0 0 D 0 0 D ... 11 19 3 6 2 4 0 0 1 0
19 14-15 1/1/2015 Stoke Man United 1 1 D 1 1 D ... 8 12 11 7 0 0 0 0 1 1
20 14-15 1/11/2015 Man United Southampton 0 1 A 0 0 D ... 9 10 3 5 2 3 0 0 0 0
21 14-15 1/17/2015 QPR Man United 0 2 A 0 0 D ... 12 10 3 3 3 2 0 0 3 2
22 14-15 1/31/2015 Man United Leicester 3 1 H 3 0 H ... 8 12 4 3 0 1 0 0 3 3
23 14-15 2/8/2015 West Ham Man United 1 1 D 0 0 D ... 7 13 9 9 2 2 0 1 1 1
24 14-15 2/11/2015 Man United Burnley 3 1 H 2 1 H ... 15 8 5 8 4 3 0 0 3 3
25 14-15 2/21/2015 Swansea Man United 2 1 H 1 1 D ... 6 15 4 10 2 4 0 0 0 1
26 14-15 2/28/2015 Man United Sunderland 2 0 H 0 0 D ... 6 11 13 1 1 1 0 1 3 2
27 14-15 3/4/2015 Newcastle Man United 0 1 A 0 0 D ... 6 14 4 1 1 2 0 0 3 1
28 14-15 3/15/2015 Man United Tottenham 3 0 H 3 0 H ... 12 10 4 2 1 1 0 0 3 3
29 14-15 3/22/2015 Liverpool Man United 1 2 A 0 1 A ... 14 17 2 3 2 2 1 0 3 2
30 14-15 4/4/2015 Man United Aston Villa 3 1 H 1 0 H ... 15 12 10 2 0 1 0 0 3 3
31 14-15 4/12/2015 Man United Man City 4 2 H 2 1 H ... 9 16 1 5 0 3 0 0 3 4
32 14-15 4/18/2015 Chelsea Man United 1 0 H 1 0 H ... 13 11 3 7 7 1 0 0 0 0
33 14-15 4/26/2015 Everton Man United 3 0 H 2 0 H ... 6 10 7 7 0 2 0 0 0 0
34 14-15 5/2/2015 Man United West Brom 0 1 A 0 0 D ... 10 11 9 3 1 0 0 0 0 0
35 14-15 5/9/2015 Crystal Palace Man United 1 2 A 0 1 A ... 13 12 7 6 0 1 0 0 3 2
36 14-15 5/17/2015 Man United Arsenal 1 1 D 1 0 H ... 16 8 5 5 1 0 0 0 1 1
37 14-15 5/24/2015 Hull Man United 0 0 D 0 0 D ... 12 15 8 1 2 2 0 1 1 0
38 15-16 8/8/2015 Man United Tottenham 1 0 H 1 0 H ... 12 12 1 2 2 3 0 0 3 1

39 rows × 24 columns

In [4]:
matches_per_season = df.groupby('Season').size()
print(matches_per_season)
Season
14-15    38
15-16    38
16-17    38
17-18    38
18-19    16
19-20    26
20-21    38
dtype: int64
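The counts above show that two seasons have fewer rows than the others, which matters when season totals are compared later. The sketch below flags them, assuming a full league season is 38 matches, as in the complete seasons above. The counts are copied from the output. `FULL_SEASON` is a name introduced here.

```python
import pandas as pd

# Match counts copied from the groupby output above.
matches_per_season = pd.Series(
    [38, 38, 38, 38, 16, 26, 38],
    index=["14-15", "15-16", "16-17", "17-18", "18-19", "19-20", "20-21"],
)

FULL_SEASON = 38  # assumed number of league matches in a complete season

# Keep only the seasons that are missing matches in this dataset.
incomplete = matches_per_season[matches_per_season < FULL_SEASON]
print(incomplete)
```

Season totals for the flagged seasons are not directly comparable with the others, so per-match figures are the safer basis for comparison.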
In [17]:
df.drop(columns = ['HTR'],inplace = True)
In [18]:
df.head()
Out[18]:
Season Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HS ... HF AF HC AC HY AY HR AR Points MU_goals
0 14-15 8/16/2014 Man United Swansea 1 2 A 0 1 14 ... 14 20 4 0 2 4 0 0 0 1
1 14-15 8/24/2014 Sunderland Man United 1 1 D 1 1 11 ... 10 15 4 4 0 2 0 0 1 1
2 14-15 8/30/2014 Burnley Man United 0 0 D 0 0 9 ... 10 14 3 6 2 2 0 0 1 0
3 14-15 9/14/2014 Man United QPR 4 0 H 3 0 19 ... 11 8 3 1 1 0 0 0 3 4
4 14-15 9/21/2014 Leicester Man United 5 3 H 1 2 15 ... 11 9 2 4 1 1 0 1 0 3

5 rows × 23 columns

In [19]:
df.to_csv("14-21.csv",index=False)
In [34]:
#goals per season
goals_season = df.groupby('Season')['MU_goals'].sum()
print(goals_season)
Season
14-15    62
15-16    49
16-17    54
17-18    68
18-19    28
19-20    38
20-21    73
Name: MU_goals, dtype: int64
In [35]:
#average goals scored per match in each season
y = goals_season/matches_per_season
print(y)
Season
14-15    1.631579
15-16    1.289474
16-17    1.421053
17-18    1.789474
18-19    1.750000
19-20    1.461538
20-21    1.921053
dtype: float64
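Totals, match counts and the per-match average can also be produced in one grouped aggregation. This is a small sketch on six rows copied from this notebook (the first five 14-15 matches and the opening 15-16 match), so the numbers describe the sample only, not the full seasons. The output column names are made up for illustration.

```python
import pandas as pd

# Sample only: MU_goals for rows 0-4 (14-15) and row 38 (15-16) of this notebook.
sample = pd.DataFrame({
    "Season": ["14-15"] * 5 + ["15-16"],
    "MU_goals": [1, 1, 0, 4, 3, 1],
})

# Named aggregation: one output column per statistic.
summary = sample.groupby("Season")["MU_goals"].agg(
    goals="sum", matches="count", goals_per_match="mean"
)
print(summary)
```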
In [36]:
# total goals per season scored at home games 
home_games = df[df['HomeTeam']=='Man United']
h_goals = home_games.groupby('Season')['FTHG'].sum()
print(h_goals)
Season
14-15    41
15-16    27
16-17    26
17-18    38
18-19    14
19-20    24
20-21    38
Name: FTHG, dtype: int64
In [37]:
# total goals per season scored at away games 

a_goals = goals_season-h_goals
print(a_goals)
Season
14-15    21
15-16    22
16-17    28
17-18    30
18-19    14
19-20    14
20-21    35
dtype: int64
In [24]:
#Goals scored in the first half

df['halft_goals'] = df.apply(lambda row: row['HTHG'] if row['HomeTeam']=='Man United' else row['HTAG'],axis=1)
first_half_goals = df.groupby('Season')['halft_goals'].sum()
print(first_half_goals)
Season
14-15    33
15-16    21
16-17    28
17-18    29
18-19    15
19-20    20
20-21    28
Name: halft_goals, dtype: int64
In [26]:
#goals scored in the second half

sec_half_goals = goals_season-first_half_goals
print(sec_half_goals)
Season
14-15    29
15-16    28
16-17    26
17-18    39
18-19    13
19-20    18
20-21    45
dtype: int64
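To compare halves across seasons of different length, the second-half goals can be expressed as a share of each season's total. This is a sketch using the totals printed above. The name `second_half_share` is introduced here.

```python
import pandas as pd

seasons = ["14-15", "15-16", "16-17", "17-18", "18-19", "19-20", "20-21"]

# Values copied from the goals_season and first_half_goals outputs above.
total_goals = pd.Series([62, 49, 54, 68, 28, 38, 73], index=seasons)
first_half = pd.Series([33, 21, 28, 29, 15, 20, 28], index=seasons)

# Second-half goals are whatever is left after the first half.
second_half = total_goals - first_half

# Share of each season's goals that came after half-time, in percent.
second_half_share = (second_half / total_goals * 100).round(1)
print(second_half_share)
```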
In [27]:
df.groupby('Season')[['HY','AY']].sum()
Out[27]:
HY AY
Season
14-15 64 75
15-16 55 65
16-17 67 77
17-18 64 68
18-19 30 34
19-20 54 56
20-21 75 51
In [13]:
#yellow card at home and away

df['mu_y_h'] = np.where(df['HomeTeam']=='Man United',df['HY'],0)
df['mu_y_a'] = np.where(df['AwayTeam']=='Man United',df['AY'],0)

df['mu_r_h'] = np.where(df['HomeTeam']=='Man United',df['HR'],0)
df['mu_r_a'] = np.where(df['AwayTeam']=='Man United',df['AR'],0)

yellow_home =df.groupby('Season')['mu_y_h'].sum()
yellow_away =df.groupby('Season')['mu_y_a'].sum()
red_home =df.groupby('Season')['mu_r_h'].sum()
red_away =df.groupby('Season')['mu_r_a'].sum()
print(f"yellow cards at home\n{yellow_home}\n")
print(f"yellow cards away\n{yellow_away}\n")
print(f"red cards at home\n{red_home}\n")
print(f"red cards away\n{red_away}\n")
yellow cards at home
Season
14-15    31
15-16    29
16-17    36
17-18    29
18-19    14
19-20    22
20-21    36
Name: mu_y_h, dtype: int64

yellow cards away
Season
14-15    33
15-16    36
16-17    40
17-18    35
18-19    19
19-20    28
20-21    28
Name: mu_y_a, dtype: int64

red cards at home
Season
14-15    1
15-16    0
16-17    1
17-18    0
18-19    0
19-20    0
20-21    1
Name: mu_r_h, dtype: int64

red cards away
Season
14-15    4
15-16    1
16-17    1
17-18    1
18-19    2
19-20    0
20-21    0
Name: mu_r_a, dtype: int64

        mu_y_h  mu_y_a
Season                
14-15       38      38
15-16       38      38
16-17       38      38
17-18       38      38
18-19       16      16
19-20       26      26
20-21       38      38
        mu_r_h  mu_r_h
Season                
14-15        1       1
15-16        0       0
16-17        1       1
17-18        0       0
18-19        0       0
19-20        0       0
20-21        1       1

Yellow and red cards per season

In [28]:
df['MU_y'] = df.apply(lambda row: row['HY'] if row['HomeTeam']=='Man United' else row['AY'],axis=1)
df['MU_r'] = df.apply(lambda row: row['HR'] if row['HomeTeam']=='Man United' else row['AR'],axis=1)

yellow_season = df.groupby('Season')['MU_y'].sum()
red_season = df.groupby('Season')['MU_r'].sum()


print(yellow_season)
print(red_season)
Season
14-15    64
15-16    65
16-17    76
17-18    64
18-19    33
19-20    50
20-21    64
Name: MU_y, dtype: int64
Season
14-15    5
15-16    1
16-17    2
17-18    1
18-19    2
19-20    0
20-21    1
Name: MU_r, dtype: int64
In [14]:
#shots taken vs shots on target

df['MU_S'] = df.apply(lambda row: row['HS'] if row['HomeTeam']=='Man United' else row['AS'],axis=1)
df['MU_ST'] = df.apply(lambda row: row['HST'] if row['HomeTeam']=='Man United' else row['AST'],axis=1)

shots_season = df.groupby('Season')['MU_S'].sum()
shots_tar_season = df.groupby('Season')['MU_ST'].sum()


print(f"total shots taken \n{shots_season}\n")
print(f"shots on target \n{shots_tar_season}\n")

perc = (shots_tar_season/shots_season)*100
print(f"percentage of shots on target per season\n{perc}")
total shots taken 
Season
14-15    509
15-16    430
16-17    589
17-18    513
18-19    210
19-20    381
20-21    521
Name: MU_S, dtype: int64

shots on target 
Season
14-15    179
15-16    143
16-17    210
17-18    181
18-19    100
19-20    146
20-21    212
Name: MU_ST, dtype: int64

percentage of shots on target per season
Season
14-15    35.166994
15-16    33.255814
16-17    35.653650
17-18    35.282651
18-19    47.619048
19-20    38.320210
20-21    40.690979
dtype: float64

VISUALIZATIONS

In [164]:
#TOTAL GOALS OF EACH SEASON OF MAN UNITED

plt.plot(goals_season)
plt.grid(axis='y')
plt.title('Man United Goals Per Season')
plt.xlabel('Seasons')
plt.ylabel('Goals')
Out[164]:
Text(0, 0.5, 'Goals')
[Figure: line plot "Man United Goals Per Season" (x-axis: Seasons, y-axis: Goals)]

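Based on the graph, the 20-21 season had the most goals scored (73), followed by the 17-18 season (68). The 18-19 season shows the fewest goals (28), but the data contains only 16 matches for that season, so its total is not directly comparable. Per match, the 15-16 season was the lowest at about 1.29 goals.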

In [158]:
#GOALS SCORED IN FIRST HALF VS SECOND HALF FOR EACH SEASON


seasons = first_half_goals.index
x = range(len(seasons))  

plt.bar(x, first_half_goals, width=0.4, label='First Half Goals', align='center')
plt.bar([i+0.4 for i in x], sec_half_goals, width=0.4, label='Second Half Goals', align='center')

plt.xticks([i + 0.2 for i in x], seasons, rotation=45)
plt.ylabel('Goals')
plt.title('Man United: First Half vs Second Half Goals per Season')
plt.legend()
plt.tight_layout()
plt.show()
[Figure: grouped bar chart "Man United: First Half vs Second Half Goals per Season" (y-axis: Goals)]

In four of the seven seasons (14-15, 16-17, 18-19 and 19-20), the majority of the goals were scored in the first half, whereas in the remaining three the majority came in the second half. The gap was largest in the 20-21 season, when 45 goals came in the second half against 28 in the first.

In [42]:
!pip install plotly
In [49]:
#home goals vs away goals per season.

import plotly.express as px
import pandas as pd

# Make a new DataFrame for heatmap
df_heat = pd.DataFrame({
    'Season': list(h_goals.index) * 2,
    'Type': ['Home'] * len(h_goals) + ['Away'] * len(h_goals),
    'Goals': list(h_goals.values) + list(a_goals.values)
})

fig = px.density_heatmap(df_heat, x="Season", y="Type", z="Goals",
                         color_continuous_scale="Viridis", text_auto=True)
fig.show()

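In the 16-17 season, Man United scored more away goals (28) than home goals (26). In the 18-19 season, the same number of goals (14) was scored in home and away games. In every other season, including 19-20 (24 at home, 14 away), more goals were scored at home.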
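The home and away comparison can be checked directly from the per-season totals printed earlier. This is a sketch with those values typed in by hand. The names `diff`, `away_ahead` and `level` are introduced here for illustration.

```python
import pandas as pd

seasons = ["14-15", "15-16", "16-17", "17-18", "18-19", "19-20", "20-21"]

# Values copied from the h_goals and a_goals outputs earlier in the notebook.
home = pd.Series([41, 27, 26, 38, 14, 24, 38], index=seasons)
away = pd.Series([21, 22, 28, 30, 14, 14, 35], index=seasons)

# Positive means more goals at home, negative means more goals away.
diff = home - away

away_ahead = diff[diff < 0].index.tolist()  # seasons with more away goals
level = diff[diff == 0].index.tolist()      # seasons with equal home and away goals
print(diff)
print("more away goals:", away_ahead)
print("equal home and away:", level)
```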

In [43]:
#shots vs shots on target

import plotly.graph_objects as go


# Compute the difference: shots that were NOT on target
shots_off_target = shots_season - shots_tar_season


fig = go.Figure(data=[
    go.Bar(name='ON Target', y=shots_season.index, x=shots_tar_season, orientation='h'),
    go.Bar(name='OFF Target', y=shots_season.index, x=shots_off_target, orientation='h')
])


# Customize layout
fig.update_layout(
    barmode='stack',
    title='Man United: Total Shots vs Shots on Target per Season',
    xaxis_title='Number of Shots',
    yaxis_title='Season',
    template='plotly_white'
)

fig.show()

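The 2020–21 season had the highest number of shots on target (212), which is consistent with it also being the season with the most goals (73). Shots on target alone do not explain the goal totals, though: 2016–17 had almost as many shots on target (210) but only 54 goals.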
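One rough way to see how closely goals follow shots on target is to divide one by the other for each season. This is a sketch using the season totals printed earlier. `conversion` is a name introduced here, and goals per shot on target is only one possible measure.

```python
import pandas as pd

seasons = ["14-15", "15-16", "16-17", "17-18", "18-19", "19-20", "20-21"]

# Values copied from the goals_season and shots_tar_season outputs.
goals = pd.Series([62, 49, 54, 68, 28, 38, 73], index=seasons)
shots_on_target = pd.Series([179, 143, 210, 181, 100, 146, 212], index=seasons)

# Goals scored per shot on target in each season.
conversion = (goals / shots_on_target).round(3)
print(conversion)
print("lowest conversion:", conversion.idxmin())
print("highest conversion:", conversion.idxmax())
```

A ratio like this is independent of how many matches a season contains, so it can be compared across the shorter 18-19 and 19-20 samples as well.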

In [44]:
import plotly.express as px

fig = px.pie(perc,values=perc.values,names=perc.index,color_discrete_sequence=px.colors.sequential.Viridis)
fig.show()

In 2020–21, Man United had the most shots on target in total (212). But when we look at accuracy (shots on target as a percentage of total shots), 2018–19 was the best at 47.6%. This means that in 2018–19 the team's shooting was more accurate, even though it took fewer shots overall. The data contains only 16 matches for that season.

In [46]:
dot_data = pd.DataFrame({
    'Season': yellow_home.index,
    'Yellow (Home)': yellow_home.values,
    'Yellow (Away)': yellow_away.values,
    'Red (Home)': red_home.values,
    'Red (Away)': red_away.values
})


dot_data_melted = dot_data.melt(id_vars='Season', var_name='Card Type', value_name='Count')

# Dot Plot
fig = px.scatter(
    dot_data_melted,
    x='Count',
    y='Season',
    color='Card Type',
    symbol='Card Type',
    title='Man United: Yellow and Red Cards (Home vs Away)',
    template='plotly_white'
)

fig.update_traces(marker=dict(size=12))
fig.show()

In most seasons, Man United received more yellow and red cards in away games, but in the 2020–21 season the trend reversed: they got more cards at home (36 yellow and 1 red at home, against 28 yellow and 0 red away).
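The home and away card comparison can be reproduced from the four per-season series printed earlier. This is a sketch with those values typed in by hand. The column names and `more_at_home` are illustrative.

```python
import pandas as pd

seasons = ["14-15", "15-16", "16-17", "17-18", "18-19", "19-20", "20-21"]

# Values copied from the yellow/red, home/away outputs earlier in the notebook.
cards = pd.DataFrame({
    "yellow_home": [31, 29, 36, 29, 14, 22, 36],
    "yellow_away": [33, 36, 40, 35, 19, 28, 28],
    "red_home": [1, 0, 1, 0, 0, 0, 1],
    "red_away": [4, 1, 1, 1, 2, 0, 0],
}, index=seasons)

# Combine yellow and red cards into one total per venue.
cards["home_total"] = cards["yellow_home"] + cards["red_home"]
cards["away_total"] = cards["yellow_away"] + cards["red_away"]

# Seasons in which Man United received more cards at home than away.
more_at_home = cards.index[cards["home_total"] > cards["away_total"]].tolist()
print(more_at_home)
```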

SUMMARY

In the 2020–21 season, Man United had the most goals (73) and the most shots on target (212), but in 2018–19 the shooting was more accurate (47.6% of shots on target), from only 16 matches in the data. In four of the seven seasons more goals came in the first half, and in most seasons more cards were given in away games, but 2020–21 was the opposite on both counts. Man United mostly scored more goals at home, except in 2016–17 (more away goals) and 2018–19 (same at home and away).